

Synthetic experiments (R2, R4)

Neural Information Processing Systems

Teacher learning curve for Frozen Lake: the student return induced by the teaching policy at the end of the curriculum improves as CISR trains more students. For CISR, we evaluate a teacher policy trained with 30 students on new test students, while the Bandit baseline must explore and exploit anew for each student, since [27] cannot learn from previous students. Thank you for your helpful comments! Using multiple students enables CISR's key novelty: allowing the teacher to learn from previously supervised students. This makes CISR applicable, e.g., in a flavor of sim-to-real transfer where a curriculum policy is learned in simulation. Thus, we have at least 270 possible curricula. That CISR determines a good one after only 10 students attests to its learning ability.





Safe Reinforcement Learning via Curriculum Induction

Turchetta, Matteo, Kolobov, Andrey, Shah, Shital, Krause, Andreas, Agarwal, Alekh

arXiv.org Artificial Intelligence

In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly. In such settings, the agent needs to behave safely not only after but also while learning. To achieve this, existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations during exploration with high probability, but both the probabilistic guarantees and the smoothness assumptions inherent in the priors are not viable in many scenarios of interest, such as autonomous driving. This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor that saves the agent from violating constraints during learning. In this model, we introduce a monitor that neither needs to know how to do well at the task the agent is learning nor how the environment works. Instead, it has a library of reset controllers that it activates when the agent starts behaving dangerously, preventing it from doing damage. Crucially, the choice of which reset controller to apply in which situation affects the speed of agent learning. Based on observing agents' progress, the teacher itself learns a policy for choosing the reset controllers, a curriculum, to optimize the agent's final policy reward. Our experiments use this framework in two environments to induce curricula for safe and efficient learning.
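The teacher/student loop this abstract describes can be sketched in a few lines. The sketch below is a hedged illustration, not the paper's actual algorithm or API: the controller names, the `train_student` stand-in, and the epsilon-greedy teacher update are all illustrative assumptions.

```python
import random

# Sketch: the teacher has a library of reset controllers and, by observing
# the final reward of each student it supervises, learns which controller
# to apply at each stage of training. All names here (RESET_CONTROLLERS,
# train_student) are hypothetical.

RESET_CONTROLLERS = ["hard_reset", "soft_reset", "no_intervention"]

def train_student(curriculum):
    """Stand-in for training one student to convergence under the given
    per-stage reset controllers; returns the student's final reward."""
    # Toy model: safe hard resets early and light intervention late pay off.
    reward = 0.0
    for stage, c in enumerate(curriculum):
        helps_early = c == "hard_reset" and stage == 0
        helps_late = c == "no_intervention" and stage == len(curriculum) - 1
        lo, hi = (0.5, 1.0) if (helps_early or helps_late) else (0.0, 0.5)
        reward += random.uniform(lo, hi)
    return reward

def induce_curriculum(num_students=30, horizon=3, epsilon=0.2):
    """Epsilon-greedy teacher: per stage, keep a running average of each
    controller's contribution to the student's final reward."""
    value = [{c: 0.0 for c in RESET_CONTROLLERS} for _ in range(horizon)]
    count = [{c: 0 for c in RESET_CONTROLLERS} for _ in range(horizon)]
    for _ in range(num_students):
        seq = [random.choice(RESET_CONTROLLERS) if random.random() < epsilon
               else max(value[t], key=value[t].get)
               for t in range(horizon)]
        reward = train_student(seq)
        for t, c in enumerate(seq):  # credit every stage of the curriculum
            count[t][c] += 1
            value[t][c] += (reward - value[t][c]) / count[t][c]
    # Greedy curriculum: best controller per stage after all students.
    return [max(value[t], key=value[t].get) for t in range(horizon)]

curriculum = induce_curriculum()
```

The key structural point matches the abstract: the teacher never needs to solve the student's task itself; it only observes outcomes across students and adjusts which intervention it applies when.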


Learning Curriculum Policies for Reinforcement Learning

Narvekar, Sanmit, Stone, Peter

arXiv.org Artificial Intelligence

Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task by first training on a series of simpler tasks and transferring the knowledge acquired to the target task. Automatically choosing a sequence of such tasks (i.e., a curriculum) is an open problem that has been the subject of much recent work in this area. In this paper, we build upon a recent method for curriculum design, which formulates the curriculum sequencing problem as a Markov Decision Process. We extend this model to handle multiple transfer learning algorithms, and show for the first time that a curriculum policy over this MDP can be learned from experience. We explore various representations that make this possible, and evaluate our approach by learning curriculum policies for multiple agents in two different domains. The results show that our method produces curricula that can train agents to perform on a target task as fast as or faster than existing methods.
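The curriculum-as-MDP formulation the abstract mentions can be illustrated with a small sketch: states summarize the student's current competence, actions select the next source task, and tabular Q-learning learns the curriculum policy. The task set, the competence model, and the state discretization below are illustrative assumptions, not the paper's actual formulation.

```python
import random

TASKS = ["easy", "medium", "target"]

def train_on(task, competence):
    """Stand-in for training on a source task; returns the updated
    (transferable) competence toward the target task."""
    gain = {"easy": 0.20, "medium": 0.35, "target": 0.15}[task]
    return min(1.0, competence + gain * random.uniform(0.5, 1.0))

def learn_curriculum_policy(episodes=300, alpha=0.3, gamma=0.9, eps=0.2):
    """Q-learning over (competence bucket, task) pairs; the reward is the
    competence gained, so the learned policy orders source tasks to reach
    target-task mastery quickly."""
    q = {}
    for _ in range(episodes):
        c = 0.0
        for _ in range(6):  # bound the curriculum length
            s = round(c, 1)  # discretize the curriculum-MDP state
            if random.random() < eps:
                a = random.choice(TASKS)
            else:
                a = max(TASKS, key=lambda t: q.get((s, t), 0.0))
            c_next = train_on(a, c)
            r = c_next - c  # reward: progress toward mastery
            s_next = round(c_next, 1)
            best_next = max(q.get((s_next, t), 0.0) for t in TASKS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            c = c_next
            if c >= 1.0:  # target task mastered; episode ends
                break
    return q

q_table = learn_curriculum_policy()
```

Note the design choice this mirrors from the abstract: because the curriculum itself is an MDP, a curriculum *policy* (rather than a fixed task sequence) can be learned from experience and reused across students whose competence evolves differently.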